Abstract

We study the design of polylogarithmic depth algorithms for approximately solving packing
and covering semidefinite programs (or positive SDPs for short). This is a natural SDP
generalization of the well-studied positive LP problem.

Although positive LPs can be solved in polylogarithmic depth while using only
$\tilde{O}(\log^2 n/\varepsilon^2)$ parallelizable iterations [4, 33], the best known positive SDP
solvers due to Jain and Yao [18] require $O(\log^{14} n/\varepsilon^{13})$ parallelizable iterations.
Several alternative solvers have been proposed to reduce the exponents in the number of
iterations [19, 30]. However, the correctness of the convergence analyses in these works has
been called into question [30], as they both rely on algebraic monotonicity properties that do
not generalize to matrix algebra.

In this paper, we propose a very simple algorithm based on the optimization framework
proposed in [4] for LP solvers. Our algorithm only needs $\tilde{O}(\log^2 n/\varepsilon^2)$ iterations,
matching that of the best LP solver. To surmount the obstacles encountered by previous
approaches, our analysis requires a new matrix inequality that extends Lieb-Thirring's
inequality, and a sign-consistent, randomized variant of the gradient truncation technique
proposed in [3, 4].


The abstract of a previous version of this paper has appeared in the proceedings of SODA 2016. [1] 




1 Introduction 


Solvers for linear programs (LPs) and semidefinite programs (SDPs) are important algorithmic 
tools for many computational tasks, spanning the fields of computer science, operations research, 
statistics, and applied mathematics. Although polynomial-time generic solvers for LPs and SDPs 
have been known for a long time, their performance is often unsatisfactory in the big-data scenario. 

In the past two decades, a significant amount of attention has been paid towards a special class 
of LPs and SDPs, known as positive LPs [23] and positive SDPs [20] respectively. At a high level, 
positive LPs are characterized by non-negative variables and a non-negative constraint matrix; 
similarly, positive SDPs are described by positive semidefinite (PSD) matrix variables and a family 
of PSD matrices as constraints. In this paper, we are interested in solving positive SDPs, formally 
defined as follows. 

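Positive SDP. Given $m \times m$ PSD matrices $A_1, A_2, \ldots, A_n$, positive SDP (after putting
in its standard form) refers to the following pair of SDPs:^1

Packing SDP: $\max_{x \ge 0} \big\{ \mathbb{1}^T x \,:\, \sum_{i=1}^n x_i A_i \preceq I \big\}$ , (1.1)

Covering SDP: $\min_{Y \succeq 0} \big\{ \mathrm{Tr}(Y) \,:\, A_i \bullet Y \ge 1 \ \ \forall i \in [n] \big\}$ . (1.2)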

Since the two programs are dual to each other, let us denote by OPT the optimal value to both
of them. Also, let $x^*$ be any optimal solution for the packing SDP (1.1). We say that $x \ge 0$ is a
$(1-\varepsilon)$-approximation to the packing SDP if $\sum_{i=1}^n x_i A_i \preceq I$ and
$\mathbb{1}^T x \ge (1-\varepsilon)\mathrm{OPT}$, and $Y \succeq 0$ a $(1+\varepsilon)$-approximation to the covering SDP if
$A_i \bullet Y \ge 1$ for all $i \in [n]$ and $\mathrm{Tr}(Y) \le (1+\varepsilon)\mathrm{OPT}$.
In this paper, we assume without loss of generality that

$\min_{i \in [n]} \{ \|A_i\|_{spe} \} = 1$ , where $\|A_i\|_{spe}$ is the spectral norm of $A_i$ ,

since otherwise one can scale all $A_i$ by a constant factor, and the solution OPT as well as $x^*$ are
only affected by this same constant factor. We denote by $\mathcal{A} = (A_1, \ldots, A_n)$.
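To make the two feasibility conditions concrete, the following is a minimal NumPy sketch that checks them on a toy instance. The helper names and the diagonal instance are illustrative assumptions, not part of the paper; on this instance the packing and covering values coincide, as duality predicts.

```python
import numpy as np

def packing_feasible(x, As, tol=1e-9):
    """True if x >= 0 and sum_i x_i A_i <= I in the PSD order."""
    m = As[0].shape[0]
    slack = np.eye(m) - sum(xi * Ai for xi, Ai in zip(x, As))
    return bool(min(x) >= -tol and np.linalg.eigvalsh(slack).min() >= -tol)

def covering_feasible(Y, As, tol=1e-9):
    """True if Y is PSD and A_i . Y = Tr(A_i Y) >= 1 for every i."""
    psd = np.linalg.eigvalsh(Y).min() >= -tol
    return bool(psd and all(np.trace(A @ Y) >= 1 - tol for A in As))

# Hypothetical toy instance, normalized so that min_i ||A_i||_spe = 1.
As = [np.diag([1.0, 0.0]), np.diag([0.0, 2.0])]
x_star = np.array([1.0, 0.5])   # packing objective 1^T x = 1.5
Y_star = np.diag([1.0, 0.5])    # covering objective Tr(Y) = 1.5
```

Since both candidates are feasible and attain the same objective value, each certifies the optimality of the other on this instance.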

History. Positive SDP instances have been used to model a large number of computational
problems, such as Max-Cut [14, 20], sparse PCA [14], coloring [14], the ARV relaxation of
SparsestCut [13] and BalancedSeparator [7, 28], and many others. Positive SDPs also found
application in computational complexity, where they were crucial in establishing the QIP = PSPACE
equivalence [15], as well as in quantum interactive proofs [16] and quantum zero-sum games [17].
In addition, techniques developed in this line of research have also inspired many other important
results, most notably regarding spectral graph theory [2, 27, 28].

While there has been a lot of research on the fast approximate solution of positive LPs [3, 4, 6,
8-12, 21, 23-26, 32, 36, 37], the more general positive SDP case has lagged somewhat behind. Most
known positive SDP solvers [5, 7, 13-17] demand a parallel running time that is
$\mathrm{polylog}(nm/\varepsilon) \cdot \mathrm{poly}(\rho)$ in order to produce a $(1 \pm \varepsilon)$ approximation of the optimal
value. In this expression, $\rho$ is a "width" parameter that depends on the numeric value of the SDP
and that can sometimes be as large as $\mathrm{poly}(n, m)$.

^1 The most general form of covering SDP can be written as follows. Given $m \times m$ PSD matrices
$C, A_1, \ldots, A_n$, and non-negative scalars $b_1, \ldots, b_n$, a general covering SDP is to

minimize $C \bullet Y$ subject to the constraint that $A_i \bullet Y \ge b_i$ for each $i \in [n]$ and $Y \succeq 0$.

It is a simple exercise, proved nonetheless in [30, Appendix A], to see that the above general form
can be easily translated into our standard form. This is also true for packing SDP.
Problem    Paper          Parallel Depth Per Iteration                                  Number of Iterations
p/c LP     [23]           $\log(nm)$                                                    $\log^2(nm)/\varepsilon^4$
p/c LP     [4]            $\log(nm)$                                                    $\log^2(nm)/\varepsilon^3$
p/c SDP    [18]           $\mathrm{polylog}(nm) \cdot \mathrm{poly}(1/\varepsilon)$     $\log^{14}(nm)/\varepsilon^{13}$
p/c SDP    [19, 30]       $\log^2(nm)/\varepsilon$                                      $\log^3(nm)/\varepsilon^4$, in doubt ^a
p/c SDP    [this paper]   $\log^2(nm)/\varepsilon$                                      $\log^2(nm)/\varepsilon^3$ ^b

Table 1: Comparisons of asymptotic running times among width-independent approximate solvers
for positive LPs and SDPs. Notice that each iteration of an SDP solver requires a $1/\varepsilon$-dependence
to approximate the matrix exponential using the Johnson-Lindenstrauss Lemma [30].


^a See Section 2 for details.

^b The present paper only discusses the algorithm that converges in $\log^2(nm)/\varepsilon^3$ iterations. It can
be improved to $\log^2(nm)\log(1/\varepsilon)/\varepsilon^2$ for positive SDP using exactly the same technique provided
by Wang et al. [33] for positive LP. We shall include a detailed proof of this in a next version of
this paper.

In a seminal work in 1993, Luby and Nisan [23] introduced the first width-independent and
polylogarithmic-parallel-time positive LP solver. Based on this breakthrough, in 2011, Jain and
Yao [18] proposed the first approximate positive-SDP solver that is width-independent and whose
parallel running time is only $\mathrm{poly}(\log n, 1/\varepsilon)$. In fact, their algorithm is a faithful generalization
of the positive LP solver of Luby and Nisan [23] to positive SDPs. Although the convergence rate
(i.e., number of parallelizable iterations) required by Luby and Nisan's algorithm is only
$O(\log^2(nm)/\varepsilon^4)$, the convergence rate of Jain and Yao's is as large as $O(\log^{14}(nm)/\varepsilon^{13})$
(see Table 1). This significant loss in the running time stems from the harder task of computing
with matrices, and in particular from the loss of commutativity in matrix algebra with respect to
the vector setting.

The poor theoretical performance of [18] has attracted some researchers to study alternative
positive-SDP solvers. Motivated by Young's algorithm [36] for positive LPs, two alternative solvers
have been proposed [19, 30]. However, the theoretical convergence of these two new solvers remains
unclear, as the correctness of both convergence analyses has been called into question. The issue
with the algorithm of [30] is explicitly stated in the latest arXiv version of that paper [31]. A
similar issue has been identified [29, 35] with the proof of [19]. In a nutshell, the proof difficulties
in both works arise because Young's algorithm, in its current form, relies on a monotonicity argument.
While such monotonicity holds naturally in the vector (i.e., LP) case, it does not generalize to the
matrix (i.e., SDP) world. See Section 2 for a detailed discussion of this.

As a result, the best parallel running time of width-independent positive SDP solvers remains
$O(\log^{14}(nm)/\varepsilon^{13})$, due to Jain and Yao [18].

This Paper. In this paper, we present an algorithm PosSDPSolver($\mathcal{A}, \varepsilon$) that runs only in
$O\big(\frac{\log n \cdot \log(nm/\varepsilon)}{\varepsilon^3}\big)$ iterations. This matches the best convergence rate of the
width-independent parallel positive LP solver [4], and is a significant improvement over the best
known width-independent positive SDP solver by Jain and Yao [18]. It is also an improvement over
the solvers of [30] and [19], even if their analyses can be fixed. (See Table 1.)

Our algorithm is also much simpler than all the previous width-independent positive SDP
solvers, as it avoids the use of "phases" and restarts that are required by previous solvers [18, 19, 30].
Our algorithm is simply divided into $O\big(\frac{\log n \cdot \log(nm/\varepsilon)}{\varepsilon^3}\big)$ iterations. Starting from some
initial vector $x \ge 0$, in each iteration, we perform $n$ matrix exponential computations
$A_1 \bullet e^{\Psi}, \ldots, A_n \bullet e^{\Psi}$ in parallel for some symmetric matrix $\Psi$ satisfying
$\|\Psi\|_{spe} \le O(\log(nm)/\varepsilon)$, and then change $x_i$ according to the value of $A_i \bullet e^{\Psi}$. This same
algorithm simultaneously produces $1 \pm O(\varepsilon)$ approximate solutions to the packing SDP (1.1) and
the covering SDP (1.2).

We remark here that, as originally put forward by Arora and Kale [7], and then formally
established by Peng and Tangwongsan [30], each of our iterations can be implemented to run
in $O(\log^2(nm)/\varepsilon)$ parallel time after some simple preprocessing. In fact, such computations are
required by all the previous width-independent positive SDP solvers.

Our Techniques. Our algorithm is directly based on the optimization framework of the positive
LP solver recently put forward by Allen-Zhu and Orecchia [4]. The non-commutativity introduced
by matrices creates significant obstacles and technical challenges that have forced us to make both
our algorithm and analysis different from [4].

To begin with, just like the result in [4], we interpret the positive SDP problem purely as an
optimization question, i.e., to minimize $f(x)$ for some convex function $f$ that is an SDP
extension of the LP objective proposed in [4]. In each iteration of our algorithm, we compute
the coordinate gradient $\nabla_i f(x) \approx A_i \bullet e^{\Psi} - 1$ for each $i \in [n]$.

An Old Story. In [4], the authors update each $x_i$ as follows. They first define the truncated
gradient by letting $\xi_i$ be essentially $\min\{1, \nabla_i f(x)\}$.^2 Next, update each
$x_i \leftarrow x_i \cdot e^{-\alpha \xi_i}$ for some global parameter $\alpha = \Theta(\varepsilon^2/\log(nm)) > 0$.

The key idea behind the convergence result of [4] is that, if one changes $x$ according to the rule
above, then for each "important" $i \in [n]$ (i.e., coordinates $i$ satisfying $\nabla_i f(x) \notin [-\varepsilon, \varepsilon]$),
$\nabla_i f(x)$ is guaranteed to change only by a multiplicative factor close to 1 as $x$ changes, and
therefore the sign of $\nabla_i f(x)$ for each important $i$ remains the same before and after each update.
This leads to the conclusion that the objective value $f(x)$ effectively decreases during each iteration.

Unfortunately, this "multiplicative-change" guarantee, which is a crucial component of most
width-independent solvers, is false in the SDP setting.

Our New Ideas. In this paper, we make two important observations. First, suppose for a moment
that $x$ is updated in a sign-consistent manner: either it non-decreases or it non-increases for all the
coordinates. Even under this sign-consistency assumption, $\nabla_i f(x)$ does not necessarily remain of
the same sign for each important coordinate $i$, so the previous analysis of [4] still fails in the SDP
setting. However, under this sign-consistency assumption, we can show that a carefully chosen
weighted summation of $\nabla_i f(x)$ does maintain the same sign. This consideration is sufficient to prove
that the objective significantly decreases at every iteration. To show that the weighted summation
remains of the same sign, we require a generalization of the Lieb-Thirring inequality. To the best
of our knowledge, this is a new matrix inequality, which may be of independent interest. We shall
discuss the relation between our generalization of Lieb-Thirring and positive SDPs in Section 2.

Finally, to ensure that $x$ is updated in a sign-consistent manner, we introduce randomness as
follows. We flip an unbiased coin at each of our iterations, and choose to either update the $x_i$'s in a
non-decreasing manner (therefore ignoring all coordinates $i$ with $\nabla_i f(x) > 0$), or in a non-increasing
manner (therefore ignoring all coordinates $i$ with $\nabla_i f(x) < 0$). Such a random choice can be shown
to decrease the objective $f(x)$ well in expectation, but adds a lot of difficulty to the analysis of the
covering SDP. In short, after such randomness is introduced, the old analysis of [4] only gives a
solution $Y$ whose expectation $\mathbb{E}[Y]$ is feasible to the covering SDP (1.2): that is, $A_i \bullet \mathbb{E}[Y] \ge 1$
for each $i \in [n]$. Such a result is not useful on its own because we need $A_i \bullet Y \ge 1$ for each
$i \in [n]$, and therefore we need to propose a totally different analysis that bypasses this difficulty
(see Section 6).

Conclusion. In this paper we show that the positive LP solver by Allen-Zhu and Orecchia [4]
can be extended to the SDP setting without any asymptotic loss in the convergence rate.

^2 There is an optimization insight behind why such a truncation is needed, and we refer interested
readers to the introduction of [4].
At a high level, to convert any positive LP solver to SDP, one needs to trade off between
(a) "what is allowed to be changed in the algorithm without hurting its performance" and (b)
"what must be changed in order to work with matrix algebra". In this paper, we make use of the
optimization framework of [3], which gives us the greatest degree of freedom in (a), and prove a new
matrix inequality that gives us a better understanding of (b). Together, these technical advances
lead to a width-independent, parallel, simpler, and faster solver for positive SDPs.

1.1 Roadmap 

We introduce our new matrix inequality and discuss its connection to positive SDP in
Section 2. Next, in Section 3, we describe our algorithm PosSDPSolver. In Section 4, we define
an objective $f_\mu(x)$ and relate it to positive SDP. In Section 5 and Section 6 respectively, we
describe the convergence analyses for the packing and the covering SDPs.

2 Some False and Some True Inequalities in Matrix Algebra 

We denote by $A \bullet B = \mathrm{Tr}(AB) = \mathrm{Tr}(BA)$ the matrix inner product, and by $\|A\|_{spe}$ the spectral
norm of a matrix $A$. If $X$ is symmetric, we use $e^X$ to denote its matrix exponential. We write
$A \succeq 0$ if $A$ is positive semidefinite (PSD), and $A \succeq B$ if $A - B \succeq 0$.

Some False Matrix Inequalities. The following is the SDP version of a fundamental inequality
that the positive LP solver of [4] relies on: for every symmetric matrix $\Psi$ and every $i \in [n]$,

$A_i \bullet e^{\Psi + B} = (1 \pm O(\varepsilon)) \cdot A_i \bullet e^{\Psi}$ if $-\varepsilon I \preceq B \preceq \varepsilon I$ . (2.1)

Unfortunately, this inequality is false in the general SDP case. It is straightforward to check that
it holds when all matrices involved are diagonal.

Similarly, here is another SDP inequality, whose LP version is crucial to many positive LP
solvers [8-10, 36, 37]. It is the following monotonicity statement: for every symmetric matrix $\Psi$
and every $i \in [n]$,

$A_i \bullet e^{\Psi + B} \ge A_i \bullet e^{\Psi}$ if $B \succeq 0$ .

However, this inequality is again false.

Unfortunately, these false matrix facts have found their way into the positive SDP solvers
proposed in [19, 30]. It is not clear at this point if these analyses can be fixed [29, 35].^3 Both the
inequalities above become true if $\Psi$ and $B$ commute. This is precisely why the aforementioned
positive LP solvers are correct.

Our New Approach. In this section, we shall prove that

$B \bullet e^{\Psi + B} = (1 \pm O(\varepsilon)) \cdot B \bullet e^{\Psi}$ as long as $\varepsilon I \succeq B \succeq 0$ or $-\varepsilon I \preceq B \preceq 0$. (2.2)

This non-trivial matrix inequality holds even if $B$ and $\Psi$ do not commute, and shall become
important for our later proofs in Section 5.1. We shall prove this by first establishing an interesting
extended form of the Lieb-Thirring inequality.

In 1976, Lieb and Thirring [22] proved that for every $A, B \succeq 0$ and every $r \ge 1$, it holds that
$\mathrm{Tr}\big((B^{1/2} A B^{1/2})^r\big) \le \mathrm{Tr}\big(B^{r/2} A^r B^{r/2}\big)$. This inequality is known as the Lieb-Thirring inequality

^The ArXiv version [31] of the paper of Peng and Tangwongsan [30] acknowledges the error. The error in the 
analysis of [19] lies in the proof of Lemma 8, where they use the fact that “localjfa;) only increases”. This is an 
instantiation of the second false inequality above. 






and is famous for its applications in quantum mechanics and differential equations. Very recently,
Allen-Zhu, Liao, and Orecchia have connected it to online matrix optimization problems [2].

In the special case of $r = 2$ (applied with $A^{1/2}$ in place of $A$), the Lieb-Thirring inequality says
that $\mathrm{Tr}(B^{1/2} A^{1/2} B A^{1/2} B^{1/2}) \le \mathrm{Tr}(BAB)$. In this paper, we establish the following
generalization of the Lieb-Thirring inequality, which turns out to be crucial for the convergence
analysis of our positive SDP solver. To the best of our knowledge, this inequality has not appeared
in the literature.

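Lemma 2.1 (Extended Lieb-Thirring Inequality). Given $A \succ 0$, $B \succeq 0$ and $\alpha \in [0,1]$, we have

$(B^{1/2} A^{\alpha} B^{1/2}) \bullet (B^{1/2} A^{1-\alpha} B^{1/2}) \le \mathrm{Tr}(BAB)$ .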
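As a sanity check (not a proof), the inequality of Lemma 2.1 can be tested numerically in the form $\mathrm{Tr}(B A^{\alpha} B A^{1-\alpha}) \le \mathrm{Tr}(BAB)$, which is how it is used in the proof below. The following NumPy sketch does so on one random instance; the helper names, the random seed, and the matrix sizes are illustrative assumptions.

```python
import numpy as np

def psd_power(A, p):
    """A^p for a symmetric positive definite A, via its eigendecomposition."""
    w, V = np.linalg.eigh(A)
    return (V * w**p) @ V.T

def lieb_thirring_gap(A, B, alpha):
    """Tr(BAB) - Tr(B A^alpha B A^(1-alpha)); Lemma 2.1 says this is >= 0."""
    lhs = np.trace(B @ psd_power(A, alpha) @ B @ psd_power(A, 1 - alpha))
    return np.trace(B @ A @ B) - lhs

rng = np.random.default_rng(0)
G, H = rng.standard_normal((2, 4, 4))
A = G @ G.T + 0.1 * np.eye(4)   # positive definite
B = H @ H.T                     # positive semidefinite, does not commute with A
gaps = [lieb_thirring_gap(A, B, a) for a in np.linspace(0.0, 1.0, 11)]
```

The gap vanishes (up to rounding) at $\alpha = 0$ and $\alpha = 1$, matching the remark at the start of the proof that these two cases are trivial.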


Unlike the original proof of Lieb-Thirring inequality which relies on Epstein’s concavity theorem, 
our proof of Lemma 2.1 relies on Lieb’s concavity theorem: 


Proposition 2.2 (Lieb's concavity theorem). For all $m \times n$ matrices $K$, and all $q, r$ such that
$0 \le q \le 1$ and $0 \le r \le 1$, with $q + r \le 1$, the function $F(A, B) := \mathrm{Tr}(K^T A^q K B^r)$ is jointly concave
over $(A, B)$, where $A$ (resp. $B$) ranges over the set of all $m \times m$ (resp. $n \times n$) positive definite matrices.

Proof of Lemma 2.1. The inequality is obvious when $\alpha = 0$ or $\alpha = 1$, and therefore we shall assume
without loss of generality that $\alpha \in (0,1)$. In addition, we can assume without loss of generality
that $B$ is diagonal: otherwise, one can apply an orthogonal transformation to make $B$ diagonal.

Let us write $A = A^D + A^O$, where $A^D$ is the diagonal part of $A$, and $A^O$ is the off-diagonal part
of $A$. Define $A_\lambda := A^D + \lambda A^O = \lambda A + (1-\lambda) A^D$. It is clear from this definition that $A_\lambda \succ 0$ for
all $\lambda \in [0,1]$. In fact, we notice that $A \succ 0$ implies $A^D$ is positive in all of its diagonal entries. As
a consequence, there exists some constant $\varepsilon > 0$ such that $A_\lambda \succ 0$ even for all $\lambda \in [-\varepsilon, 1]$.

Now, consider two matrix-to-real functions $g(A) := (B^{1/2} A^{\alpha} B^{1/2}) \bullet (B^{1/2} A^{1-\alpha} B^{1/2})$ and
$h(A) := \mathrm{Tr}(BAB)$. Since $g(A) = \mathrm{Tr}(B A^{\alpha} B A^{1-\alpha})$, Lieb's concavity theorem (cf. Proposition 2.2)
implies that $g(A)$ is concave in $A$ (over the positive definite cone). In contrast, $h(A)$ is simply a
function that is linear in $A$. Therefore, $R(\lambda) := g(A_\lambda) - h(A_\lambda)$ is defined and concave over
$\lambda \in [-\varepsilon, 1]$, and Lemma 2.1 is equivalent to saying that $R(1) \le 0$.

We begin analyzing $R(\lambda)$ by noticing that $R(0) = g(A_0) - h(A_0) = 0$: this is a simple consequence
of the fact that $B$, being a diagonal matrix, commutes with $A_0 = A^D$. Therefore, combined
with the concavity of $R(\lambda)$, to prove $R(1) \le 0$ it suffices to prove that $R(\lambda)$ is differentiable at
$\lambda = 0$ and $R'(0) = 0$. Indeed, a concave function lies below its tangent line, so we would then have
$R(1) \le R(0) + R'(0) = 0$.

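First of all, $M_1(\lambda) := (A_\lambda)^{\alpha}$ is differentiable at $\lambda = 0$ and its derivative at $\lambda = 0$ has zero
diagonal entries. Indeed, using the representation
$M_1(\lambda) = \frac{1}{\pi \csc(\alpha\pi)} \int_0^\infty x^{\alpha-1} \cdot A_\lambda (A_\lambda + xI)^{-1} \, dx$, one can verify that

$\frac{dM_1(\lambda)}{d\lambda}\Big|_{\lambda=0} = \frac{1}{\pi \csc(\alpha\pi)} \int_0^\infty x^{\alpha-1} \Big( \frac{dA_\lambda}{d\lambda}(A_\lambda + xI)^{-1} - A_\lambda (A_\lambda + xI)^{-1} \frac{dA_\lambda}{d\lambda} (A_\lambda + xI)^{-1} \Big)\Big|_{\lambda=0} \, dx$

$= \frac{1}{\pi \csc(\alpha\pi)} \int_0^\infty x^{\alpha-1} \Big( A^O (A^D + xI)^{-1} - A^D (A^D + xI)^{-1} A^O (A^D + xI)^{-1} \Big) \, dx$ .

Notice that in the above equality $A^O$ is a matrix with zero diagonal entries, while $(A^D + xI)^{-1}$ and
$A^D (A^D + xI)^{-1}$ are both diagonal matrices. Therefore, $M_1'(0)$ is a matrix with zero diagonal entries.

Similarly, defining $M_2(\lambda) := (A_\lambda)^{1-\alpha}$, we have that $M_2(\lambda)$ is differentiable at $\lambda = 0$ and $M_2'(0)$
is a matrix with zero diagonal entries.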
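Algorithm 1 PosSDPSolver($\mathcal{A}, \varepsilon$)

Input: $\mathcal{A} = (A_1, \ldots, A_n)$ where each $A_i \in \mathbb{R}^{m \times m}$ is PSD, and $\varepsilon \in (0, 1/10]$.
Output: nonnegative vector $x \in \mathbb{R}^n_{\ge 0}$ and PSD matrix $Y \in \mathbb{R}^{m \times m}$.
 1: $\mu \leftarrow \frac{\varepsilon}{4 \log(nm/\varepsilon)}$ and $\alpha \leftarrow \frac{\varepsilon\mu}{4}$.    ▷ parameters
 2: $x_i^{(0)} \leftarrow \frac{1 - \varepsilon/2}{n \|A_i\|_{spe}}$ for all $i \in [n]$.    ▷ initial vector $x^{(0)}$
 3: $T \leftarrow \frac{8 \log(2n)}{\alpha\varepsilon}$.    ▷ number of iterations
 4: for $k \leftarrow 0$ to $T - 1$ do
 5:    Randomly choose $\mathcal{T}^{(k)}$ to be either $\mathcal{T}_-$ or $\mathcal{T}_+$, each with probability half.
 6:    for $i \leftarrow 1$ to $n$ do
 7:       Compute the feedback $v_i \leftarrow e^{\frac{1}{\mu}(\sum_{j \in [n]} x_j^{(k)} A_j - I)} \bullet A_i - 1 \in [-1, \infty)$.
 8:       Perform an update: $x_i^{(k+1)} \leftarrow x_i^{(k)} \cdot e^{-\alpha \cdot \mathcal{T}^{(k)}(v_i)}$.
 9:    end for
10: end for
11: return $\frac{x^{(T)}}{1+\varepsilon}$ and $\frac{\bar{Y}}{1 - 2\varepsilon}$, where $\bar{Y} := \frac{1}{T} \sum_{k=0}^{T-1} Y(x^{(k)})$.    ▷ recall that $Y(x) = e^{\frac{1}{\mu}(\sum_{i \in [n]} x_i A_i - I)}$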


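Finally, we can compute that

$R'(0) = \frac{d\, g(A_\lambda)}{d\lambda}\Big|_{\lambda=0} - \frac{d (B^2 \bullet A_\lambda)}{d\lambda}\Big|_{\lambda=0}$

$= (B^{1/2} M_1'(0) B^{1/2}) \bullet (B^{1/2} (A^D)^{1-\alpha} B^{1/2}) + (B^{1/2} (A^D)^{\alpha} B^{1/2}) \bullet (B^{1/2} M_2'(0) B^{1/2}) - B^2 \bullet A^O$ .

Clearly, this means $R'(0) = 0$ because $M_1'(0)$, $M_2'(0)$ and $A^O$ are all matrices with zero diagonal
entries, and $B$ and $A^D$ are diagonal matrices. □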


Our extended Lieb-Thirring inequality immediately yields the following monotonicity property
on matrix exponential, which is a formal statement of (2.2). Its proof is deferred to Appendix A.

Lemma 2.3. Given PSD matrix $A$ satisfying $\varepsilon I \succeq A \succeq 0$ and symmetric matrix $\Psi$, define function
$f(t) := A \bullet e^{\Psi + tA}$ over real values $t$. Then, $0 \le f'(t) \le \varepsilon A \bullet e^{\Psi + tA} = \varepsilon f(t)$ for all $t$. As a result:

(a) $f(t) \le f(0) \cdot e^{\varepsilon t}$ for all $t \ge 0$, and

(b) $f(t) \ge f(0) \cdot e^{\varepsilon t}$ for all $t \le 0$.

3 Our Algorithm 

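Our algorithm PosSDPSolver($\mathcal{A}, \varepsilon$) runs only in $T = O\big(\frac{\log n \cdot \log(nm/\varepsilon)}{\varepsilon^3}\big)$ parallelizable
iterations. We iteratively update $x$ so as to maximize $\mathbb{1}^T x$, while keeping the approximate
feasibility $\sum_{i \in [n]} x_i A_i \preceq (1+\varepsilon) I$. At each iteration $k$, we compute a feedback vector $v$ so that
$v_i = e^{\frac{1}{\mu}(\sum_{j \in [n]} x_j A_j - I)} \bullet A_i - 1 \in [-1, \infty)$, and perform a multiplicative update
$x_i \leftarrow x_i \cdot e^{-\alpha \cdot \mathcal{T}(v_i)}$. Here, $\mathcal{T}(\cdot)$ is randomly chosen (for each iteration $k$) as either
$\mathcal{T}_-$ or $\mathcal{T}_+$, defined as follows: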

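Definition 3.1. The thresholding functions $\mathcal{T}_-, \mathcal{T}_+ : [-1, \infty) \to [-1, 1]$ are defined as follows:

$\mathcal{T}_-(v) = 0$ if $v \in [-\varepsilon, \infty)$;   $\mathcal{T}_-(v) = v$ if $v \in [-1, -\varepsilon)$;

and

$\mathcal{T}_+(v) = 0$ if $v \in [-1, \varepsilon]$;   $\mathcal{T}_+(v) = v$ if $v \in (\varepsilon, 1]$;   $\mathcal{T}_+(v) = 1$ if $v > 1$.

Note that if $\mathcal{T} = \mathcal{T}_-$ then the variables of $x$ monotonically non-decrease, and vice versa.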
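The iteration described in this section can be sketched in a few lines of NumPy. This is a sequential, dense-matrix illustration under stated assumptions, not the parallel implementation analyzed in the paper: the constants in `mu`, `alpha`, the initial vector, and the default iteration count are taken as assumptions about Algorithm 1, the matrix exponential is computed exactly by eigendecomposition, and the `num_iters` and `seed` parameters and the returned `history` list are additions for experimentation.

```python
import numpy as np

def sym_expm(S):
    """Matrix exponential of a symmetric matrix via eigendecomposition."""
    w, V = np.linalg.eigh(S)
    return (V * np.exp(w)) @ V.T

def threshold(v, eps, sign):
    """T_- (sign < 0) keeps v in [-1, -eps); T_+ (sign > 0) keeps v > eps, capped at 1."""
    if sign < 0:
        return v if v < -eps else 0.0
    return min(v, 1.0) if v > eps else 0.0

def smoothed_objective(x, As, mu):
    """f_mu(x) = mu * Tr(exp((sum_i x_i A_i - I) / mu)) - 1^T x."""
    m = As[0].shape[0]
    S = sum(xi * Ai for xi, Ai in zip(x, As)) - np.eye(m)
    return mu * np.trace(sym_expm(S / mu)) - np.sum(x)

def pos_sdp_solver(As, eps, num_iters=None, seed=0):
    rng = np.random.default_rng(seed)
    n, m = len(As), As[0].shape[0]
    mu = eps / (4 * np.log(n * m / eps))
    alpha = eps * mu / 4                      # assumed step-size constant
    if num_iters is None:                     # can be very large for small eps
        num_iters = int(np.ceil(8 * np.log(2 * n) / (alpha * eps)))
    x = np.array([(1 - eps / 2) / (n * np.linalg.norm(A, 2)) for A in As])
    Y_sum = np.zeros((m, m))
    history = [smoothed_objective(x, As, mu)]
    for _ in range(num_iters):
        sign = rng.choice([-1, 1])            # fair coin: T_- or T_+
        S = sum(xi * Ai for xi, Ai in zip(x, As)) - np.eye(m)
        Y = sym_expm(S / mu)
        Y_sum += Y
        v = np.array([np.sum(Y * A) - 1.0 for A in As])   # feedback A_i . Y - 1
        step = np.array([threshold(vi, eps, sign) for vi in v])
        x = x * np.exp(-alpha * step)         # multiplicative update
        history.append(smoothed_objective(x, As, mu))
    return x / (1 + eps), (Y_sum / num_iters) / (1 - 2 * eps), history
```

For small instances one can pass an explicit `num_iters` to inspect how the objective values in `history` evolve; the default iteration count grows like $1/\varepsilon^3$ and is slow in pure Python.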
Remark 3.2 (Matrix Exponentials). Matrix exponential computations are required by all
width-independent positive SDP solvers, and dominate the complexity of each algorithmic iteration.
As in previous solvers, it is a simple exercise to verify that our entire analysis in this paper continues
to hold, though with a worse constant, if we only compute the values
$e^{\frac{1}{\mu}(\sum_{j \in [n]} x_j A_j - I)} \bullet A_i$ up to a $1 \pm \varepsilon/2$ multiplicative factor. Therefore, for simplicity's sake,
in this paper we assume that the matrix exponentials can be computed exactly. Note that the
$1 \pm \varepsilon/2$ approximate computations of $e^{\frac{1}{\mu}(\sum_{j \in [n]} x_j A_j - I)} \bullet A_i$ for all $i \in [n]$ can be performed in
polylog parallel iterations.^4

We summarize our theorem as follows. 

Theorem 3.4 (Positive SDP). Letting $(x, Y) =$ PosSDPSolver($\mathcal{A}, \varepsilon$), we have that with at least
a constant probability

• $x$ is a $(1 - O(\varepsilon))$-approximate solution for the packing SDP (1.1),

• $Y$ is a $(1 + O(\varepsilon))$-approximate solution for the covering SDP (1.2), and

• the number of iterations for PosSDPSolver is $T = O(\log n \cdot \log(nm/\varepsilon) \cdot \varepsilon^{-3})$.

If each $A_i = Q_i Q_i^T$ is preprocessed into its Cholesky decomposition, each iteration can be
implemented in $O(\log^2(nm)/\varepsilon)$ parallel depth.


4 The Convex Objective 

We define the following convex objective for the positive SDP problem. It is completely analogous 
to its LP variant introduced in [4], and therefore we state its properties without proof. 

Definition 4.1. Letting parameter $\mu := \frac{\varepsilon}{4 \log(nm/\varepsilon)}$, define the smoothed objective $f_\mu(x)$ as

$f_\mu(x) := \mu \cdot \mathrm{Tr}\big( e^{\frac{1}{\mu}(\sum_{i \in [n]} x_i A_i - I)} \big) - \mathbb{1}^T x$ .

We want to study the minimization problem on $f_\mu(x)$ over all $x \ge 0$. This objective $f_\mu(x)$
captures the packing SDP because, on one hand, we want to minimize $-\mathbb{1}^T x$ so as to maximize
$\mathbb{1}^T x$, and on the other hand, the exponential penalty function says that if
$\sum_{i \in [n]} x_i A_i \preceq (1+\varepsilon) I$ is violated, a large positive penalty is introduced.

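Proposition 4.2.

(a) $\mathrm{OPT} \in [1, n]$.

(b) Letting $x = (1 - \varepsilon/2) x^* \ge 0$, we have $f_\mu(x) \le -(1-\varepsilon)\mathrm{OPT}$.

(c) Letting $x^{(0)} \ge 0$ be such that $x_i^{(0)} = \frac{1 - \varepsilon/2}{n \|A_i\|_{spe}}$ for each $i \in [n]$, we have $f_\mu(x^{(0)}) \le -\frac{1-\varepsilon}{n}$.

(d) For any $x \ge 0$ satisfying $f_\mu(x) \le 0$, we have $\sum_{i \in [n]} x_i A_i \preceq (1+\varepsilon) I$ and thus $\mathbb{1}^T x \le (1+\varepsilon)\mathrm{OPT}$.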

^4 More precisely, when each $A_i = Q_i Q_i^T$ is presented in its Cholesky decomposition, we have

Theorem 3.3 ([30]). Given an $m \times m$ PSD matrix $\Psi$ with $p$ non-zero entries and $\|\Psi\|_{spe} \le \kappa$, and given
$m \times m$ matrices $\{A_1, \ldots, A_n\}$ in the form of $A_i = Q_i Q_i^T$ where the total number of non-zero entries
across all $Q_i$ is $q$. Then, there exists an algorithm that computes $e^{\Psi} \bullet A_i$ for all $i \in [n]$ up to a
$(1 \pm \varepsilon)$ factor in

max {k, log-} logm + loglogm^ depth and ^ max {k, log-} • p + log work 

Since one can verify that $\|\Psi\|_{spe} \le \kappa := 1/\mu = O(\log(nm/\varepsilon)/\varepsilon)$ in our case, each iteration of
PosSDPSolver can be implemented to run in $O(\log^2(nm)/\varepsilon)$ parallel time. (Here, we can safely
assume that $\varepsilon$ is not extremely small: for very small $\varepsilon$ one should instead use, for instance, an
Interior Point Method to solve the given SDP.)
(e) If $x \ge 0$ satisfies $f_\mu(x) \le -(1 - O(\varepsilon))\mathrm{OPT}$, then $\frac{x}{1+\varepsilon}$ is a $(1 - O(\varepsilon))$-approximate solution
for the packing SDP.

(f) The gradient of $f_\mu(x)$ can be written as

$\nabla f_\mu(x) = (A_1 \bullet Y(x), \ldots, A_n \bullet Y(x)) - \mathbb{1}$ where $Y(x) := e^{\frac{1}{\mu}(\sum_{i \in [n]} x_i A_i - I)}$ .
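The gradient formula in item (f) can be confirmed against central finite differences. The sketch below is illustrative only: the instance, the value of `mu` (chosen large so that finite differences are well conditioned), and the function names are assumptions.

```python
import numpy as np

def sym_expm(S):
    """Matrix exponential of a symmetric matrix via eigendecomposition."""
    w, V = np.linalg.eigh(S)
    return (V * np.exp(w)) @ V.T

def Y_of(x, As, mu):
    """Y(x) = exp((sum_i x_i A_i - I) / mu)."""
    m = As[0].shape[0]
    return sym_expm((sum(xi * Ai for xi, Ai in zip(x, As)) - np.eye(m)) / mu)

def f_mu(x, As, mu):
    """Smoothed objective mu * Tr(Y(x)) - 1^T x."""
    return mu * np.trace(Y_of(x, As, mu)) - np.sum(x)

def grad_f_mu(x, As, mu):
    """Closed-form gradient: coordinate i equals A_i . Y(x) - 1."""
    Y = Y_of(x, As, mu)
    return np.array([np.sum(A * Y) for A in As]) - 1.0

# Hypothetical non-commuting instance.
As = [np.array([[1.0, 0.0], [0.0, 0.0]]), np.array([[1.0, 1.0], [1.0, 1.0]])]
x, mu, h = np.array([0.3, 0.2]), 0.5, 1e-6
numeric = np.array([(f_mu(x + h * e, As, mu) - f_mu(x - h * e, As, mu)) / (2 * h)
                    for e in np.eye(2)])
analytic = grad_f_mu(x, As, mu)
```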

5 Convergence Analysis for Packing SDP 

Throughout this paper, we use superscript $x^{(k)}$ to represent vector $x$ at iteration $k$, and subscript
$x_i$ to represent the $i$-th coordinate of vector $x$. Our convergence analysis is divided into three steps,
and the first step is the main technical difference between this paper and its LP variant [4].

Step I: Gradient Descent. We interpret (see Section 5.1 for details) each update
$x_i^{(k+1)} \leftarrow x_i^{(k)} \cdot e^{-\alpha \cdot \mathcal{T}^{(k)}(v_i)}$ as a gradient descent step,^5 and show that the objective $f_\mu(x)$
monotonically decreases between consecutive iterations:

Lemma 5.1 (Gradient Descent). For every iteration $k = 0, \ldots, T-1$ in PosSDPSolver, the
objective $f_\mu(x)$ does not increase: $f_\mu(x^{(k)}) - f_\mu(x^{(k+1)}) \ge 0$. Combining this with Proposition 4.2.c,
we have $f_\mu(x^{(k)}) \le 0$ for all $k$.

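In addition, letting $B^{(k)} \subseteq [n]$ be the set of indices $i$ such that $\nabla_i f_\mu(x^{(k)}) \ge 1$, then

$f_\mu(x^{(k)}) - \mathbb{E}[f_\mu(x^{(k+1)})] \ge \frac{\alpha}{4} \sum_{i \in B^{(k)}} x_i^{(k)} \cdot \nabla_i f_\mu(x^{(k)}) \ge 0$ .

Above, the expectation is over the random choice at iteration $k$.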

We remark here that Lemma 5.1 does not follow from any classical theory of gradient descent
because our objective $f_\mu(x)$ is simply not smooth in the positive orthant. Neither does Lemma 5.1
follow from the so-called "multiplicative Lipschitz gradient property" introduced in [4], because
the fundamental property that the work [4] relies on, "$\nabla_i f_\mu(x)$ increases as $x$ decreases, and vice
versa", no longer holds in the SDP case. This is also one of the major reasons that the results
of [19, 30] fail to produce any theoretical guarantee.

Our proof of Lemma 5.1 crucially relies on two key properties. First, the sign-consistent and
random choice of $\mathcal{T}^{(k)}$ ensures that $x$ either only increases or only decreases at a single iteration $k$.
Second, our new matrix inequality introduced in Section 2 ensures that "$\nabla_i f_\mu(x)$ increases in an
average sense as $x$ decreases". We defer the technical proof of Lemma 5.1 to Section 5.1.

Step II: Mirror Descent. It is not hard to show, and in fact proven in [4] for a slightly different
variant, that each update $x_i^{(k+1)} \leftarrow x_i^{(k)} \cdot e^{-\alpha \cdot \mathcal{T}^{(k)}(v_i)}$ can also be viewed as a mirror-descent step.

A mirror descent step in optimization is any step from $x$ to $x'$ that is of the form
$x' \leftarrow \arg\min_z \{ V_x(z) + \langle \alpha \nabla f(x), z - x \rangle \}$. Here, $\alpha > 0$ is some step length, and
$V_x(y) := w(y) - \langle \nabla w(x), y - x \rangle - w(x)$ is the Bregman divergence of some convex distance generating
function $w(x)$. In this paper, we pick $w(x) := \sum_{i \in [n]} x_i \log x_i - x_i$ to be the generalized entropy
function, and accordingly,

for every $x, y > 0$, $V_x(y) = \sum_{i \in [n]} \big( y_i \log \frac{y_i}{x_i} + x_i - y_i \big)$ .
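With the generalized entropy as distance generating function, the mirror step has the closed form $z_i = x_i \cdot e^{-\alpha g_i}$, which is exactly the shape of the multiplicative update used by the algorithm (setting the derivative $\log(z_i/x_i) + \alpha g_i$ to zero). The NumPy sketch below checks this numerically; the vector `g` is a random stand-in for the (truncated) gradient, and all names and the test instance are illustrative assumptions.

```python
import numpy as np

def bregman(y, x):
    """V_x(y) for the generalized entropy w(x) = sum_i x_i log x_i - x_i."""
    return np.sum(y * np.log(y / x) + x - y)

def mirror_step(x, g, alpha):
    """Closed-form minimizer of V_x(z) + <alpha * g, z - x> over z > 0."""
    return x * np.exp(-alpha * g)

def mirror_objective(z, x, g, alpha):
    return bregman(z, x) + alpha * np.dot(g, z - x)

rng = np.random.default_rng(2)
x = rng.uniform(0.5, 2.0, size=5)
g = rng.uniform(-1.0, 1.0, size=5)   # stands in for the truncated gradient
alpha = 0.1
z_star = mirror_step(x, g, alpha)
best = mirror_objective(z_star, x, g, alpha)
# Random multiplicative perturbations of z_star should never do better.
trials = [mirror_objective(z_star * np.exp(0.05 * rng.standard_normal(5)), x, g, alpha)
          for _ in range(100)]
```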

The next lemma easily follows from the general theory of mirror descent. Since its proof has 
essentially appeared in [4, Lemma 3.3], we prove it in Section B.3 only for the sake of completeness. 

^5 To be clear, in some literature, gradient descent refers only to steps of the form
$x \leftarrow x - c \cdot \nabla f(x)$ for some constant $c$. In this paper, we adopt the more general notion, and
use the term for any step that directly decreases $f(x)$.





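Lemma 5.2 (Mirror Descent). Letting $\gamma \in [-1,1]^n$ be defined as $\gamma_i = \mathcal{T}^{(k)}(\nabla_i f_\mu(x^{(k)}))$, we have
that for any $u \ge 0$,

$\langle \alpha\gamma, x^{(k)} - u \rangle \le \alpha^2 \mathrm{OPT} + V_{x^{(k)}}(u) - V_{x^{(k+1)}}(u)$ .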

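Step III: Coupling. Finally, as formally argued in Section B.2, the two lemmas above can be
naturally combined, yielding the following bound:

Lemma 5.3 (Coupling). For any $u \ge 0$ and $k = 0, \ldots, T-1$, we have

$\alpha (f_\mu(x^{(k)}) - f_\mu(u)) \le \langle \alpha \nabla f_\mu(x^{(k)}), x^{(k)} - u \rangle$
$\le 4 \big( f_\mu(x^{(k)}) - \mathbb{E}[f_\mu(x^{(k+1)})] \big) + 2 \big( V_{x^{(k)}}(u) - \mathbb{E}[V_{x^{(k+1)}}(u)] \big) + \alpha \cdot 2\varepsilon \mathrm{OPT} + \alpha \cdot \varepsilon \mathbb{1}^T u$ .

Above, the expectation is over the random choice at iteration $k$.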

The proof of Lemma 5.3 relies on a decomposition of the gradient $\nabla_i f_\mu(x^{(k)})$ into four components
$\xi_i^+ + \xi_i^- + \eta_i + \zeta_i$ where $\xi_i^+ \in [0,1]$, $\xi_i^- \in [-1,0]$, $\eta_i \in [0,\infty)$, and $\zeta_i \in [-\varepsilon, \varepsilon]$. This
is a main difference that distinguishes our proof from [4]: we need to decompose the $\xi$ part into a
positive and a negative term, and then apply Lemma 5.2 twice.

Putting All Together. By telescoping the inequality in Lemma 5.3, one can obtain the following 
final theorem for packing SDP. Its proof is only slightly different from that of [4, Theorem 3.5] due 



5.1 The Gradient Descent Lemma 

In this subsection we view our update $x_i^{(k+1)} \leftarrow x_i^{(k)} \cdot e^{-\alpha \cdot \mathcal{T}^{(k)}(v_i)}$ as a gradient-descent step
and prove Lemma 5.1.

We begin by observing that each $x_i$ is changed by a factor of at most $1 \pm 4\alpha/3$ per iteration:

Fact 5.5. We always have $x_i^{(k+1)} \in x_i^{(k)} \cdot [1 - 4\alpha/3, 1 + 4\alpha/3]$.

Proof. We can always write $x_i^{(k+1)} = x_i^{(k)} \cdot e^{t}$ for some $t \in [-\alpha, \alpha] \subseteq [-1/4, 1/4]$. According to the
fact that $e^{t} \le 1 + 4t/3$ for $t \in [0, 1/4]$ and $e^{t} \ge 1 + t \ge 1 + 4t/3$ for $t \in [-1/4, 0]$, we must have
$x_i^{(k+1)} \in x_i^{(k)} \cdot [1 - 4\alpha/3, 1 + 4\alpha/3]$. □
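The two elementary bounds on $e^{t}$ used in the proof of Fact 5.5 can be confirmed on a fine grid. This is a quick numerical illustration rather than a proof; the grid resolution and variable names are arbitrary choices.

```python
import numpy as np

# Grids covering the two ranges of t used in the proof of Fact 5.5.
t_pos = np.linspace(0.0, 0.25, 1001)
t_neg = np.linspace(-0.25, 0.0, 1001)

# e^t <= 1 + 4t/3 on [0, 1/4].
upper_holds = bool(np.all(np.exp(t_pos) <= 1 + 4 * t_pos / 3))
# e^t >= 1 + t >= 1 + 4t/3 on [-1/4, 0].
lower_holds = bool(np.all(np.exp(t_neg) >= 1 + 4 * t_neg / 3))
```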

Proof of Lemma 5.1. We prove by induction. Suppose that Lemma 5.1 is true for all indices less
than $k$. This implies, in particular, that $f_\mu(x^{(k)}) \le f_\mu(x^{(k-1)}) \le \cdots \le f_\mu(x^{(0)}) \le 0$.

There are two cases to consider at iteration $k$: (1) if we choose $\mathcal{T}_-(\cdot)$ and (2) if we choose $\mathcal{T}_+(\cdot)$.
Each of them happens with probability 1/2.

In the first case, that is, if we choose $\mathcal{T}_-(\cdot)$, we have the property that our vector does not
decrease: that is, $x_i^{(k+1)} \ge x_i^{(k)}$ for every $i \in [n]$. We compute the objective difference by the
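standard integral over gradients:

$f_\mu(x^{(k)}) - f_\mu(x^{(k+1)}) = - \int_0^1 \big\langle \nabla f_\mu\big(x^{(k)} + \tau (x^{(k+1)} - x^{(k)})\big), x^{(k+1)} - x^{(k)} \big\rangle \, d\tau$

$= \mathbb{1}^T x^{(k+1)} - \mathbb{1}^T x^{(k)} - \int_0^1 e^{\frac{1}{\mu}(\sum_{i \in [n]} x_i^{(k)} A_i - I) + \frac{\tau}{\mu} \sum_{i \in [n]} (x_i^{(k+1)} - x_i^{(k)}) A_i} \bullet \Big( \sum_{i \in [n]} (x_i^{(k+1)} - x_i^{(k)}) A_i \Big) \, d\tau$

$= \mathbb{1}^T x^{(k+1)} - \mathbb{1}^T x^{(k)} - \mu \int_0^1 B \bullet e^{\Psi + \tau B} \, d\tau$ , (5.1)

where in the last equality we have defined $\Psi := \frac{1}{\mu} \big( \sum_{i \in [n]} x_i^{(k)} A_i - I \big)$ and
$B := \frac{1}{\mu} \sum_{i \in [n]} (x_i^{(k+1)} - x_i^{(k)}) A_i \succeq 0$.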

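Notice that $f_\mu(x^{(k)}) \le 0$ together with Proposition 4.2.d tells us that $\sum_{i \in [n]} x_i^{(k)} A_i \preceq (1+\varepsilon) I$.
Combining it with Fact 5.5 we have $\sum_{i \in [n]} (x_i^{(k+1)} - x_i^{(k)}) A_i \preceq \frac{4\alpha}{3}(1+\varepsilon) I$, and therefore
$B \preceq \frac{4\alpha(1+\varepsilon)}{3\mu} I$. Applying Lemma 2.3.a with this upper bound on $B$ to (5.1), we have

$f_\mu(x^{(k)}) - f_\mu(x^{(k+1)}) \ge \mathbb{1}^T x^{(k+1)} - \mathbb{1}^T x^{(k)} - (1 + \varepsilon/4) \mu B \bullet e^{\Psi}$ .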

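Recall that, for each $i \in [n]$ satisfying $x_i^{(k+1)} \ne x_i^{(k)}$, we must have $e^{\Psi} \bullet A_i - 1 < -\varepsilon$ by the
definition of $\mathcal{T}_-(\cdot)$. Therefore, multiplying both sides by $x_i^{(k+1)} - x_i^{(k)} > 0$ and summing up over
$i \in [n]$, we obtain

$\mu B \bullet e^{\Psi} = e^{\Psi} \bullet \Big( \sum_{i \in [n]} (x_i^{(k+1)} - x_i^{(k)}) A_i \Big) \le (1 - \varepsilon)(\mathbb{1}^T x^{(k+1)} - \mathbb{1}^T x^{(k)})$ .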

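This further implies that (after some careful term rearranging)

$\mathbb{1}^T x^{(k+1)} - \mathbb{1}^T x^{(k)} - (1 + \varepsilon/4) \mu B \bullet e^{\Psi} \ge \frac{1}{2} \big( \mathbb{1}^T x^{(k+1)} - \mathbb{1}^T x^{(k)} - \mu B \bullet e^{\Psi} \big)$
$= \frac{1}{2} \langle \nabla f_\mu(x^{(k)}), x^{(k)} - x^{(k+1)} \rangle \ge 0$ .

Above, the last inequality is again by our definition of $\mathcal{T}_-$: for each $i \in [n]$ satisfying $x_i^{(k+1)} \ne x_i^{(k)}$,
it must satisfy that $\nabla_i f_\mu(x^{(k)}) < -\varepsilon$ and $x_i^{(k)} < x_i^{(k+1)}$. In conclusion, we arrive at the inequality

$f_\mu(x^{(k)}) - f_\mu(x^{(k+1)}) \ge \frac{1}{2} \langle \nabla f_\mu(x^{(k)}), x^{(k)} - x^{(k+1)} \rangle \ge 0$ .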

In the case when $\mathcal{T}_+$ is chosen, a symmetric argument (although replacing the use of Lemma 2.3.a
with Lemma 2.3.b and using slightly different constants, see Appendix B.1) yields that

Ux^'^4 - > l{^U{x^"4,x^^^ - x^^+^4 

> \ J2ieBW Vi/;.(xW) • (xf ^ - xV^^) . 

Above, the second inequality is because for each $i \in [n]$ satisfying $x_i^{(k+1)} \ne x_i^{(k)}$, it must satisfy
that $\nabla_i f_\mu(x^{(k)}) > \varepsilon$ and $x_i^{(k)} > x_i^{(k+1)}$. Next, observe that for each coordinate $i \in B^{(k)}$ we have
$x_i^{(k+1)} = x_i^{(k)} \cdot e^{-\alpha} \le (1 - 0.9\alpha) x_i^{(k)}$ for our choice of $\alpha$. Plugging this into the inequality above, we
arrive at the inequality
^ • 0.90 • xf ^ ^ f X] ^ 0 • 

i&BW i&B(’‘') 

Finally, combining the two cases above, we conclude that

$f_\mu(x^{(k)}) - \mathbb{E}[f_\mu(x^{(k+1)})] \ge \frac{\alpha}{4} \sum_{i \in B^{(k)}} \nabla_i f_\mu(x^{(k)}) \cdot x_i^{(k)} \ge 0$ . □

6 Convergence Analysis for Covering SDP 

We have seen in Section 5 that a vector $x \ge 0$ satisfying $f_\mu(x) \approx -\mathrm{OPT}$ yields an approximate
solution to the packing SDP (1.1). However, this vector $x$ itself gives no information about the
solution to the covering SDP (1.2).

In this section, we show that, defining $\bar{Y} := \frac{1}{T}\sum_{k=0}^{T-1} Y(x^{(k)})$ where $Y(x) := e^{\frac{1}{\mu}\left(\sum_{i\in[n]} x_i A_i - I\right)}$, then $\bar{Y}$ is a $(1 + O(\epsilon))$-approximate solution to the covering SDP (1.2) with at least a constant probability. Therefore, PosSDPSolver$(A, \epsilon)$ is an algorithm that simultaneously solves both the primal and the dual side of the positive SDP problem.
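As a purely illustrative aid, the following Python sketch computes the averaged dual candidate from a list of recorded primal iterates. It assumes the form $Y(x) = e^{(\sum_i x_i A_i - I)/\mu}$ read from this section; the helper names (`sym_expm`, `Y_of_x`, `averaged_dual`, `covering_report`) are hypothetical, and NumPy is assumed to be available. It is not the paper's implementation.

```python
import numpy as np

def sym_expm(S):
    """Exponential of a symmetric matrix via its eigendecomposition."""
    w, Q = np.linalg.eigh(S)
    return (Q * np.exp(w)) @ Q.T

def Y_of_x(A, x, mu):
    """Y(x) = exp((sum_i x_i A_i - I) / mu); this form is an assumption of the sketch."""
    m = A[0].shape[0]
    S = sum(xi * Ai for xi, Ai in zip(x, A)) - np.eye(m)
    return sym_expm(S / mu)

def averaged_dual(A, iterates, mu):
    """Average of Y(x^(k)) over the recorded primal iterates."""
    return sum(Y_of_x(A, x, mu) for x in iterates) / len(iterates)

def covering_report(A, Y_bar):
    """Return the covering objective Tr(Y_bar) and the smallest value of A_i . Y_bar."""
    objective = float(np.trace(Y_bar))
    min_constraint = min(float(np.sum(Ai * Y_bar)) for Ai in A)
    return objective, min_constraint
```

Here `np.sum(Ai * Y_bar)` is the trace inner product $A_i \bullet \bar{Y}$ for symmetric matrices, so the two returned numbers correspond to the optimality and feasibility quantities discussed in Lemma 6.1 and Lemma 6.2.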

Our proof can be divided into two parts. First, using similar proof techniques as in [4], one can show that $\bar{Y}$ satisfies the approximate optimality, at least in an expected sense. We prove this lemma below in Appendix C only for the sake of completeness.

Lemma 6.1. For any T > -^ = y;g have that E[Tr(y)] < (1 + 7e)0PT. 

In the second part, we wish to show that $\bar{Y}$ satisfies the approximate feasibility as well, that is, $A_i \bullet \bar{Y} \le 1 + O(\epsilon)$ for all $i \in [n]$. However, we encounter two difficulties:

• First, a similar analysis as in [4] would only imply that the expected matrix $\mathbb{E}[\bar{Y}]$ satisfies such approximate feasibility, rather than $\bar{Y}$ itself. By Markov's inequality, this only suggests that for each (rather than for all) $i \in [n]$, $A_i \bullet \bar{Y} \le 1 + O(\epsilon)$ holds with constant probability.

• Second, the analysis in [4] does not directly imply that $\bar{Y}$ is approximately feasible. Instead, one has to modify $\bar{Y}$ in a non-trivial manner, which is very unpleasant in practice.

Due to the above difficulties, we propose in this paper a fundamentally different, yet much simpler analysis for proving the approximate feasibility. This is deferred to Appendix C.

Lemma 6.2. For any T > with probability at least 1 — we have Ai»Y > 1 — 2e for all i G [n]. 



^Previously, the first and third authors of this paper have tried to bypass this difficulty using a dual smoothed 
objective in the LP case [3]. However, their analysis is more involved and loses a factor of ® in the running time. 
Appendix

A Missing Proofs for Section 2 

We need the following chain rule for the derivative of matrix exponential: 

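Proposition A.1 ([34]). If $X(t)$ is a differentiable function from reals to symmetric matrices, then

\[ \frac{d}{dt} e^{X(t)} = \int_0^1 e^{\alpha X(t)} \Big(\frac{d}{dt} X(t)\Big) e^{(1-\alpha) X(t)} \, d\alpha . \]
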
Proof of Lemma 2.3. According to Proposition A.1, we have

Ja=0 

Suppose further that A = PP^. Then, we can write 

f'{t) = j 

However, since p p q and p'^p p q, we conclude that p'^p • 

pT^(i-a)('i!+tA)p > Q therefore f'{t) > 0 for all reals t. 

Next, applying Lemma 2.1 we have that 

f'{t) = f j' Tr do = < eA . . 

□ 

B Missing Proofs for Section 5 

B.l The Gradient Descent Lemma 

In this section, we provide the detailed analysis of the symmetric case (i.e., when $T_+$ is chosen) in the proof for Lemma 5.1.

Notice that $f_\mu(x^{(k)}) \le 0$ together with Proposition 4.2.d tells us that $\sum_{i\in[n]} x_i^{(k)} A_i \preceq (1+\epsilon) I$.

Combining it with Fact 5.5 we have X]ie[n] A P therefore 

0 P P P ~ Applying Lemma 2.3.b with 0 P P P —f§L to (5.1), we have 

Jo 

> - (1 - e/4:)iaB • e'^ . 
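Recall that, for each $i \in [n]$ satisfying $x_i^{(k+1)} \neq x_i^{(k)}$, we must have $e^{\Psi} \bullet A_i - 1 > \epsilon$ by the definition of $T_+(\cdot)$. Therefore, multiplying both sides by $x_i^{(k+1)} - x_i^{(k)} < 0$ and summing up over $i \in [n]$, we obtain

\[ \mu B \bullet e^{\Psi} = e^{\Psi} \bullet \Big(\sum_{i\in[n]} \big(x_i^{(k+1)} - x_i^{(k)}\big) A_i\Big) \le (1+\epsilon)\big(\mathbf{1}^T x^{(k+1)} - \mathbf{1}^T x^{(k)}\big) . \]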

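This further implies that (after some careful term rearranging)

\[ \big(\mathbf{1}^T x^{(k+1)} - \mathbf{1}^T x^{(k)}\big) - (1 - \epsilon/4)\,\mu B \bullet e^{\Psi} \ge \frac{2}{3}\big(\mathbf{1}^T x^{(k+1)} - \mathbf{1}^T x^{(k)} - \mu B \bullet e^{\Psi}\big) = \frac{2}{3}\big\langle \nabla f_\mu(x^{(k)}), x^{(k)} - x^{(k+1)} \big\rangle \ge 0 . \]

Above, the last inequality is again by our definition of $T_+$: for each $i \in [n]$ satisfying $x_i^{(k+1)} \neq x_i^{(k)}$, it must satisfy that $\nabla_i f_\mu(x^{(k)}) > \epsilon$ and $x_i^{(k)} > x_i^{(k+1)}$. In conclusion, we arrive at the inequality

\[ f_\mu(x^{(k)}) - f_\mu(x^{(k+1)}) \ge \frac{2}{3}\big\langle \nabla f_\mu(x^{(k)}), x^{(k)} - x^{(k+1)} \big\rangle \ge 0 . \]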

B.2 The Coupling Lemma 

The main idea in our proof of Lemma 5.3 is to divide the gradient vector $\nabla f_\mu(x) \in [-1, \infty)^n$ into four components: the component containing large coordinates (i.e., bigger than $1$), the component containing positive small coordinates (i.e., in $(\epsilon, 1]$), the component containing negative small coordinates (i.e., in $[-1, -\epsilon)$), and the component containing negligible coordinates (i.e., in $[-\epsilon, \epsilon]$). The large gradients are to be taken care of by the gradient descent lemma, and the small (positive and negative) gradients are to be taken care of by the mirror descent lemma. Formally,

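Proof of Lemma 5.3. By convexity, the distance $f_\mu(x^{(k)}) - f_\mu(u)$ for an arbitrary $u \ge 0$ is upper bounded as follows:

\[ \begin{aligned} \alpha\big(f_\mu(x^{(k)}) - f_\mu(u)\big) &\le \big\langle \alpha \nabla f_\mu(x^{(k)}), x^{(k)} - u \big\rangle \\ &= \langle \alpha\eta^{(k)}, x^{(k)} - u\rangle + \langle \alpha\xi^{(k),-}, x^{(k)} - u\rangle + \langle \alpha\xi^{(k),+}, x^{(k)} - u\rangle + \langle \alpha\zeta^{(k)}, x^{(k)} - u\rangle , \end{aligned} \tag{B.1} \]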


where 

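• $\xi_i^{(k),-} = T_-(\nabla_i f_\mu(x^{(k)})) \in [-1, -\epsilon)$ is the truncated gradient, capturing small negative coordinates.

• $\xi_i^{(k),+} = T_+(\nabla_i f_\mu(x^{(k)})) \in (\epsilon, 1]$ is the truncated gradient, capturing small positive coordinates.

• $\eta_i^{(k)} := \begin{cases} \nabla_i f_\mu(x^{(k)}) - 1, & \nabla_i f_\mu(x^{(k)}) > 1; \\ 0, & \text{otherwise} \end{cases} \in [0, \infty)$, capturing the large coordinates.

• $\zeta_i^{(k)} := \begin{cases} \nabla_i f_\mu(x^{(k)}), & \nabla_i f_\mu(x^{(k)}) \in [-\epsilon, \epsilon]; \\ 0, & \text{otherwise} \end{cases} \in [-\epsilon, \epsilon]$, capturing the negligible coordinates.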
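The four-way split can be made concrete with a small Python sketch. It assumes, consistently with the ranges listed above, that the positive truncation caps a coordinate at $1$ and that the excess over $1$ goes to the large component; the function name `split_gradient` is hypothetical.

```python
def split_gradient(grad, eps):
    """Split each gradient coordinate (assumed >= -1) into four parts.

    Returns (xi_minus, xi_plus, eta, zeta) whose coordinate-wise sum is grad.
    The thresholds follow the ranges in the text; capping xi_plus at 1 is an
    assumption of this sketch.
    """
    xi_minus, xi_plus, eta, zeta = [], [], [], []
    for g in grad:
        xi_minus.append(g if g < -eps else 0.0)          # small negative part
        xi_plus.append(min(g, 1.0) if g > eps else 0.0)  # small positive part
        eta.append(g - 1.0 if g > 1.0 else 0.0)          # excess of large coordinates
        zeta.append(g if -eps <= g <= eps else 0.0)      # negligible part
    return xi_minus, xi_plus, eta, zeta
```

Every coordinate falls into exactly one of the cases $g < -\epsilon$, $|g| \le \epsilon$, $\epsilon < g \le 1$, or $g > 1$, which is why the four parts add back up to the gradient as required by (B.1).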

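^Indeed, $\mu B \bullet e^{\Psi} \le (1+\epsilon)\big(\mathbf{1}^T x^{(k+1)} - \mathbf{1}^T x^{(k)}\big)$ implies that $(1 - 3\epsilon/4) \cdot \mu B \bullet e^{\Psi} \le \mathbf{1}^T x^{(k+1)} - \mathbf{1}^T x^{(k)}$ because both sides are nonpositive and $1 - 3\epsilon/4 \ge \frac{1}{1+\epsilon}$ for our choice of $\epsilon$. Multiplying both sides by $1/3$, we have that $(1/3 - \epsilon/4) \cdot \mu B \bullet e^{\Psi} \le (1/3) \cdot \big(\mathbf{1}^T x^{(k+1)} - \mathbf{1}^T x^{(k)}\big)$. This is now equivalent to $\big(\mathbf{1}^T x^{(k+1)} - \mathbf{1}^T x^{(k)}\big) - (1 - \epsilon/4)\,\mu B \bullet e^{\Psi} \ge \frac{2}{3}\big(\mathbf{1}^T x^{(k+1)} - \mathbf{1}^T x^{(k)} - \mu B \bullet e^{\Psi}\big)$.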
We analyze the four components of (B.1) one by one.

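The $\zeta$ component is small: if $f_\mu(u) \le 0$, we have

\[ \langle \alpha\zeta^{(k)}, x^{(k)} - u\rangle \le \alpha\epsilon \cdot \big(\mathbf{1}^T x^{(k)} + \mathbf{1}^T u\big) \le \alpha\epsilon \cdot (1+\epsilon)\mathrm{OPT} + \alpha\epsilon \cdot \mathbf{1}^T u , \tag{B.2} \]

where the last inequality is because $f_\mu(x^{(k)}) \le 0$ from Lemma 5.1, which gives $\mathbf{1}^T x^{(k)} \le (1+\epsilon)\mathrm{OPT}$ by Proposition 4.2.d.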

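The $\eta$ component can be upper bounded with the help of Lemma 5.1 as follows. Note that $\eta_i^{(k)} \neq 0$ only if $i \in B^{(k)}$ (where recall from Lemma 5.1 that $B^{(k)}$ is the set of indices whose $\nabla_i f_\mu(x^{(k)})$ is no less than $1$). In particular, if $i \in B^{(k)}$ we have $\eta_i^{(k)} = \nabla_i f_\mu(x^{(k)}) - 1 < \nabla_i f_\mu(x^{(k)})$, and thus Lemma 5.1 gives

\[ \langle \eta^{(k)}, x^{(k)}\rangle \le \sum_{i\in B^{(k)}} x_i^{(k)} \nabla_i f_\mu(x^{(k)}) \le \frac{4\big(f_\mu(x^{(k)}) - \mathbb{E}[f_\mu(x^{(k+1)})]\big)}{\alpha} \]

\[ \Longrightarrow \quad \langle \alpha\eta^{(k)}, x^{(k)} - u\rangle \le \langle \alpha\eta^{(k)}, x^{(k)}\rangle \le 4\big(f_\mu(x^{(k)}) - \mathbb{E}[f_\mu(x^{(k+1)})]\big) . \]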

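Finally, the $\xi$ components are upper bounded by Lemma 5.2 as follows. Letting $\gamma^{(k)} = \xi^{(k),-}$ if $T^{(k)} = T_-$, and $\gamma^{(k)} = \xi^{(k),+}$ if $T^{(k)} = T_+$, we have that

\[ \langle \alpha\xi^{(k),-}, x^{(k)} - u\rangle + \langle \alpha\xi^{(k),+}, x^{(k)} - u\rangle = 2\,\mathbb{E}\big[\langle \alpha\gamma^{(k)}, x^{(k)} - u\rangle\big] \le 2\alpha^2 \mathrm{OPT} + 2V_{x^{(k)}}(u) - 2\,\mathbb{E}[V_{x^{(k+1)}}(u)] , \]

where the expectation is over the random choice of $T^{(k)}$ at iteration $k$.

Together, we obtain

\[ \begin{aligned} \alpha\big(f_\mu(x^{(k)}) - f_\mu(u)\big) &\le \langle \alpha\eta^{(k)}, x^{(k)} - u\rangle + \langle \alpha\xi^{(k),-}, x^{(k)} - u\rangle + \langle \alpha\xi^{(k),+}, x^{(k)} - u\rangle + \langle \alpha\zeta^{(k)}, x^{(k)} - u\rangle \\ &\le 4\big(f_\mu(x^{(k)}) - \mathbb{E}[f_\mu(x^{(k+1)})]\big) + 2\alpha^2\mathrm{OPT} + 2V_{x^{(k)}}(u) - 2\,\mathbb{E}[V_{x^{(k+1)}}(u)] + \alpha\epsilon \cdot (1+\epsilon)\mathrm{OPT} + \alpha\epsilon \mathbf{1}^T u \\ &\le 4\big(f_\mu(x^{(k)}) - \mathbb{E}[f_\mu(x^{(k+1)})]\big) + 2\big(V_{x^{(k)}}(u) - \mathbb{E}[V_{x^{(k+1)}}(u)]\big) + \alpha \cdot 2\epsilon\mathrm{OPT} + \alpha \cdot \epsilon \mathbf{1}^T u . \qquad \Box \end{aligned} \]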


B.3 The Mirror Descent Lemma 


In this subsection, we are going to view our step $x^{(k)} \to x^{(k+1)}$ as a mirror descent step, and prove Lemma 5.2. We emphasize that this subsection is included in this paper only for the sake of completeness: it is almost a simple replication of the proof of [4, Lemma 3.3].

Recall that Tr(^)(Vj/^(x(^))) G [—1,1] is the truncated gradient at step k, and satisfies 

that for all coordinates i such that G [—1,1] \ [—e,e\. We can verify 

that our careful choice of $x^{(k)} \to x^{(k+1)}$ is in fact a mirror descent step on the truncated gradient:


Claim B.1.

\[ x^{(k+1)} = \arg\min_{z \ge 0} \Big\{ V_{x^{(k)}}(z) + \big\langle \alpha\xi^{(k)}, z - x^{(k)} \big\rangle \Big\} . \tag{B.3} \]


Proof. This can be verified coordinate by coordinate, because the arg min function is over all possible $z \ge 0$, where this constraint does not impose any inter-coordinate constraint.

In other words, by substituting the definition of $V_{x^{(k)}}(z)$, we only need to verify that

\[ x_i^{(k+1)} = \arg\min_{z_i \ge 0} \Big\{ \Big( z_i \log\frac{z_i}{x_i^{(k)}} + x_i^{(k)} - z_i \Big) + \alpha\xi_i^{(k)} \cdot \big(z_i - x_i^{(k)}\big) \Big\} =: \arg\min_{z_i \ge 0} \{ g(z_i) \} . \]

At this point, the univariate function $g(z_i)$ is convex and has a unique minimizer. Since the gradient $\frac{d}{dz_i} g(z_i) = \log\frac{z_i}{x_i^{(k)}} + \alpha\xi_i^{(k)}$, this unique minimizer is indeed $z_i = x_i^{(k)} \cdot e^{-\alpha\xi_i^{(k)}}$, finishing the proof of Claim B.1. $\Box$
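The per-coordinate claim can be sanity-checked numerically. The sketch below assumes the generalized Kullback-Leibler form $z \log(z/x) + x - z$ for one coordinate of the Bregman divergence, as used in the proof above; the function names are hypothetical and the check is illustrative only.

```python
import math

def bregman_term(z, x):
    """One coordinate of V_x(z) = z*log(z/x) + x - z, for x > 0 and z >= 0."""
    if z == 0.0:
        return x  # limit of z*log(z/x) as z -> 0 is 0
    return z * math.log(z / x) + x - z

def step_objective(z, x, alpha, xi):
    """Per-coordinate objective g(z) minimized in the mirror descent step."""
    return bregman_term(z, x) + alpha * xi * (z - x)

def mirror_step(x, alpha, xi):
    """Closed-form minimizer: the multiplicative update x * exp(-alpha * xi)."""
    return x * math.exp(-alpha * xi)
```

For example, with `x = 2.0`, `alpha = 0.1` and `xi = -0.7`, evaluating `step_objective` at `mirror_step(x, alpha, xi)` and at nearby points shows the closed form attaining the smallest value, and the coordinate increases because the truncated gradient is negative.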
After confirming that our iterative step in PosSDPSolver is indeed a mirror descent step, it is
not hard to deduce Lemma 5.2 based on the proof of the classical mirror descent analysis.

Proof of Lemma 5.2. We deduce the following sequence of inequalities: 

-u) = + -u) 

^ (a^W,xW - 


< 


y : (<“> ■ (x<‘' - x('-+‘>) - 




|^(fc+l) _ ^{k)i2 


r\ r (^)'i 

zmaxjx^' \xl I 


(^^(^)(^) 


(^^(^) (^) 


(B.4) 


< + (F^(fe)(u) - V;(;=+i)(m)) 

< a^OPT + (n) - (u)) 

Here, ① is due to the minimality of $x^{(k+1)}$ in (B.3), which implies that $\nabla V_{x^{(k)}}(x^{(k+1)}) + \alpha\xi^{(k)} = 0$.

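② is due to the triangle equality of Bregman divergence:

\[ \begin{aligned} \forall x, y > 0, \quad \langle -\nabla V_x(y), y - u\rangle &= \langle \nabla w(x) - \nabla w(y), y - u\rangle \\ &= \big(w(u) - w(x) - \langle \nabla w(x), u - x\rangle\big) - \big(w(u) - w(y) - \langle \nabla w(y), u - y\rangle\big) \\ &\quad - \big(w(y) - w(x) - \langle \nabla w(x), y - x\rangle\big) \\ &= V_x(u) - V_y(u) - V_x(y) . \end{aligned} \]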
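The three-point identity above can be checked numerically. The sketch assumes the entropy-style generator $w(x) = \sum_i (x_i \log x_i - x_i)$, which is consistent with the divergence $V_x(y) = \sum_i (y_i \log\frac{y_i}{x_i} + x_i - y_i)$ used in this appendix; the function names are hypothetical.

```python
import math

def grad_w(x):
    """Gradient of the assumed generator w(x) = sum_i (x_i*log(x_i) - x_i)."""
    return [math.log(xi) for xi in x]

def bregman(x, y):
    """V_x(y) = sum_i (y_i*log(y_i/x_i) + x_i - y_i) for positive vectors."""
    return sum(yi * math.log(yi / xi) + xi - yi for xi, yi in zip(x, y))

def three_point_sides(x, y, u):
    """Return both sides of <grad w(x) - grad w(y), y - u> = V_x(u) - V_y(u) - V_x(y)."""
    gx, gy = grad_w(x), grad_w(y)
    left = sum((a - b) * (yi - ui) for a, b, yi, ui in zip(gx, gy, y, u))
    right = bregman(x, u) - bregman(y, u) - bregman(x, y)
    return left, right
```

Calling `three_point_sides` on any three strictly positive vectors of equal length returns two numbers that agree up to floating-point error, since each coordinate of both sides reduces to $(u_i - y_i)\log(y_i/x_i)$.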

(D is because 14(y) = Vi log I 7 + - y* > E* 2 max{a:i,^a ® 1® Cauchy-Schwarz. © 

is because we have < ^x\^^ owing to Fact 5.5. ® is because we have < |OPT owing 

to Proposition 4.2.d (and /^(x^^l) < 0 from Lemma 5.2 i. CH 


B.4 Proof of Theorem 5.4 

Proof of Theorem 5.4. We begin by telescoping the inequality in Lemma 5.3 for $k = 0, 1, \dots, T-1$, and choosing $u := (1 - \epsilon/2)x^*$, which satisfies $\mathbf{1}^T u \le \mathrm{OPT}$ by the definition of $x^*$:


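\[ \alpha \sum_{k=0}^{T-1} \big(\mathbb{E}[f_\mu(x^{(k)})] - f_\mu(u)\big) \le 4\big(f_\mu(x^{(0)}) - \mathbb{E}[f_\mu(x^{(T)})]\big) + 2\big(V_{x^{(0)}}(u) - \mathbb{E}[V_{x^{(T)}}(u)]\big) + \alpha T \cdot 3\epsilon\mathrm{OPT} . \tag{B.5} \]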

Above, the expectation is over the randomness of the entire algorithm. Notice that the second
term on the right hand side of (B.5) is upper bounded by


Vxio) (^) - E[f 4 (T) (u)] < 1 /( 0 ) (u) < X] ^ log _ 

i X^ j \ / )/ 

< I'^u ■ log(2n) + 1 < 20PT • log(2n) . 

Here, we have used the fact that Ui < m-xt— since UiAi P P 

\\P^i\\spe 


•4 llspe 


+ 


1 - e /2 

n||A.|| 

spe ' n||A,|| spe 

(B. 6 ) 
From here, we want to prove that $\mathbb{E}[f_\mu(x^{(T)})] \le -(1 - 5\epsilon)\mathrm{OPT}$ by way of contradiction. Suppose not, that is, $\mathbb{E}[f_\mu(x^{(T)})] > -(1 - 5\epsilon)\mathrm{OPT}$; then we have $f_\mu(x^{(0)}) - \mathbb{E}[f_\mu(x^{(T)})] < 0 + (1 - 5\epsilon)\mathrm{OPT} < \mathrm{OPT}$, giving an upper bound on the first term on the right hand side in (B.5). Substituting this and (B.6) into (B.5), and dividing both sides by $\alpha T$, we get


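\[ \begin{aligned} \frac{1}{T}\sum_{k=0}^{T-1} \big(\mathbb{E}[f_\mu(x^{(k)})] - f_\mu(u)\big) &\le \frac{4}{\alpha T}\big(f_\mu(x^{(0)}) - \mathbb{E}[f_\mu(x^{(T)})]\big) + \frac{2}{\alpha T}\big(V_{x^{(0)}}(u) - \mathbb{E}[V_{x^{(T)}}(u)]\big) + 3\epsilon\mathrm{OPT} \\ &\le \frac{4\mathrm{OPT}}{\alpha T} + \frac{4\mathrm{OPT} \cdot \log(2n)}{\alpha T} + 3\epsilon\mathrm{OPT} . \end{aligned} \]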


Finally, since we have chosen T > the above right hand side is no greater than 

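$4\epsilon\mathrm{OPT}$. This, by an averaging argument, tells us the existence of some $k \in \{0, 1, \dots, T-1\}$ with $\mathbb{E}[f_\mu(x^{(k)})] \le f_\mu(u) + 4\epsilon\mathrm{OPT} \le -(1 - 5\epsilon)\mathrm{OPT}$ (where we have used $f_\mu(u) \le -(1-\epsilon)\mathrm{OPT}$ from Proposition 4.2.b). However, this contradicts the hypothesis that $\mathbb{E}[f_\mu(x^{(T)})] > -(1 - 5\epsilon)\mathrm{OPT}$ because $f_\mu(x^{(k)}) \ge f_\mu(x^{(T)})$ according to Lemma 5.1. This finishes the proof that $\mathbb{E}[f_\mu(x^{(T)})] \le -(1 - 5\epsilon)\mathrm{OPT}$.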

The fact that $x^{(T)}$ provides a $(1 - O(\epsilon))$ approximate solution for the packing SDP is due to Proposition 4.2.e and Markov's inequality, which states that $f_\mu(x^{(T)}) \le -(1 - O(\epsilon))\mathrm{OPT}$ with at least constant probability. $\Box$


C Missing Proofs for Section 6 

The proof of Lemma 6.1 is completely analogous to its LP variant in [4]. We include it only for the
sake of completeness.

Lemma 6.1, For any T > -^ = have that E[Tr(y)] < (1 + 7e)OPT. 

Proof. Telescoping Lemma 5.3 for $k = 0, 1, \dots, T-1$ and $u = 0$, we have that


T-i . 

-e[^(V^(xW),x("))] < - E[/4x(^))]) + — (H^(0)(0) - E[P,m(0)]) 

k=0 


< - E[/4x(^))]) + ^P,(o) (0) + 2£0PT 

< - E[/4x(^))]) + ^ + 2£0PT . 


+ 2eOPT 


(C.l) 



ie[n] 

> (1 - e)I • ^Ai-i) _ jT^{k) _ ^ . (^)4 


nm 


= (1 — e)Tr(y(x*'^^)) — — m • ■ 


nm 


Above, the (only) inequality is because if B = Ai has eigenvalues Ai,..., Am > 0, then 

Sie[n] A-/) _ Xj . e(y-i)/M, However, if there are some Xj satisfying 
Aj < 1 — e, the corresponding term < e is very small, and there are at most

m such small terms. As a result, one must have X]je[m] > (1 — e) XljeH — 

m ■ (—)^ = (1 — e)I • A-J") _ , (—)^. 

V nm / v / V nm ^ 
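The eigenvalue argument above can be illustrated numerically. In the sketch below, each term with $\lambda_j < 1 - \epsilon$ is bounded by $e^{-\epsilon/\mu}$; this generic bound is an assumption of the sketch and stands in for the explicit small constant of the paper, which depends on the choice of $\mu$. The function name `eigen_sides` is hypothetical.

```python
import math

def eigen_sides(lams, eps, mu):
    """Compare sum_j lam_j * e^{(lam_j - 1)/mu} with a lower bound.

    The lower bound is (1 - eps) * sum_j e^{(lam_j - 1)/mu} - m * e^{-eps/mu},
    where m = len(lams). Terms with lam_j >= 1 - eps are bounded directly, and
    each term with lam_j < 1 - eps contributes at most e^{-eps/mu} of slack.
    """
    weights = [math.exp((lam - 1.0) / mu) for lam in lams]
    left = sum(lam * w for lam, w in zip(lams, weights))
    right = (1.0 - eps) * sum(weights) - len(lams) * math.exp(-eps / mu)
    return left, right
```

For nonnegative eigenvalues the first returned number is never smaller than the second, which mirrors the step that turns $\sum_i x_i A_i \bullet Y(x)$ into $(1-\epsilon)\mathrm{Tr}(Y(x))$ minus a negligible term.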

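On the other hand, since $\sum_{i\in[n]} x_i^{(T)} A_i \preceq (1+\epsilon) I$ by Proposition 4.2.d, we must have $\mathbf{1}^T x^{(T)} \le (1+\epsilon)\mathrm{OPT}$ by the definition of OPT, and thus $f_\mu(x^{(T)}) \ge 0 - (1+\epsilon)\mathrm{OPT}$. This gives an upper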

bound on the right hand side of (C.l) that is ^^^^OPT + ^ + 2eOPT < 3eOPT, due to our choice 
of T > ^. 

— as 

Together, we deduce from (C.l) that 


T-l 


T 

k=0 

1 1 


(1 - e);^ V E Tr(y(x(*^))) - - m ■ ( —)^ < 3eOPT 

I nm 


E[Tr(y)] = TrE - ^ Y{x^'^'^) < E[l^x(^)] + 4eOPT < (1 + e)OPT + 4eOPT 


where the last inequality is from $\mathbf{1}^T x^{(k)} \le (1+\epsilon)\mathrm{OPT}$ for each $k$ (see Proposition 4.2.d). $\Box$

As mentioned earlier, our proof for Lemma 6.2 below is fundamentally different from its much 
weaker version in [4]. 

Lemma 6.2„ For any T > with probability at least 1 — we have Ai • Y > 1 — 2e for all z G [n]. 

Proof. For each iteration $k = 0, \dots, T-1$ and coordinate $i \in [n]$, we denote by

• '= G [— 1 , 1 ] the actual truncated gradient, and 

• '= |(T_(Vi/^(x^^^)) + T+(Vj/^(x*^^^))) G [—1/2,1/2] the expected truncated gradient. 

It is easy to verify that E[ 7 ^^)] = where the expectation is over the random choice of In 

addition, since Vi/^(x(^)) = whenever Vj/^(x(^)) G [—1,1] \ [—e,e] owing to the dehnition of 
the thresholding functions, we automatically have 


IT) (0) _ Y^x-i (fc) 

In the hrst step, recalling that x) ' = xl ■ e T by the definition of our update rule 

I Line 8 of PosSDPSolver i, and recalling that xf^'^Ai ^ (1 + e)I -< 1.5/ due to Proposition 4.2.d 
which implies x,-^^ < -rmf—, we automatically have that for every i G [n], independent of the 
randomness of the algorithm, it always satisfies that 


1 > ^og(^-V(II^^II^P^ • 4°^)) > - log(2n) ^ _£ 

T ^ “ aT - aT - 8 ' 

k=0 


Above, the second inequality is due to our choice of x^^^ and the third inequality is due to our 
choice of T. Next, define (t!^^ we have that {^A:,i}fc=i is a martingale, satisfying 

that E[Zk,i\Zi^i ,..., Zk-i,i] = Zk-i,i and \Zk,i - Zk-i,i\ < 1/2. By the Azuma-Hoeffding inequality, 


we have 


pd/E({. 

/c=0 


T-l 


(fc) A^) 


Ti 


’)<-|]=Pr[% 


' > |] S = 


< 


lOOn 
By a union bound, with probability at least $1 - \epsilon/100$, for every $i \in [n]$


T-l 


T-l 




(k) 


-s = 2 ig«f’- 7 '‘’)+ 2 ^ > 2 .(-j)-j-6 > -2e 


1 ^ „(fc) 


k=0 k=0 k=0 k=0 

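In other words, with probability at least $1 - \epsilon/100$, for every $i \in [n]$,

\[ A_i \bullet \bar{Y} - 1 = \frac{1}{T}\sum_{k=0}^{T-1} \big(A_i \bullet Y(x^{(k)}) - 1\big) = \frac{1}{T}\sum_{k=0}^{T-1} \nabla_i f_\mu(x^{(k)}) \ge -2\epsilon . \qquad \Box \]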